Pattern Matching in Trees and Strings
Abstract
We study the design of efficient algorithms for combinatorial pattern matching. More concretely, we study algorithms for tree matching, string matching, and string matching in compressed texts.
Tree Matching Survey. We begin with a survey on tree matching problems for labeled trees based on deleting, inserting, and relabeling nodes. We review the known results for the tree edit distance problem, the tree alignment distance problem, and the tree inclusion problem. The survey covers both ordered and unordered trees. For each of the problems we present one or more of the central algorithms in detail.
Tree Inclusion. Given rooted, ordered, and labeled trees $P$ and $T$, the tree inclusion problem is to determine if $P$ can be obtained from $T$ by deleting nodes in $T$. We show that the tree inclusion problem can be solved in $O(n_T)$ space with the following running time: $\min\{O(l_P n_T),\ O(n_P l_T \log\log n_T + n_T),\ O(n_P n_T / \log n_T + n_T \log n_T)\}$. Here $n_S$ and $l_S$ denote the number of nodes and leaves in tree $S \in \{P, T\}$, respectively, and we assume that $n_P \le n_T$. Our results match or improve the previous time complexities while using only $O(n_T)$ space. All previous algorithms required $\Omega(n_P n_T)$ space in the worst case.
Tree Path Subsequence. Given rooted and labeled trees $P$ and $T$, the tree path subsequence problem is to report which paths in $P$ are subsequences of which paths in $T$. Here a path begins at the root and ends at a leaf. We show that the tree path subsequence problem can be solved in $O(n_T)$ space with the following running time: $\min\{O(l_P n_T + n_P),\ O(n_P l_T + n_T),\ O(n_P n_T / \log n_T + n_T + n_P \log n_P)\}$. As with our results for the tree inclusion problem, this matches or improves the previous time complexities while using only $O(n_T)$ space. All previous algorithms required $\Omega(n_P n_T)$ space in the worst case.
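To make the tree path subsequence problem concrete, the following is a minimal brute-force sketch, not the algorithm from the abstract. It assumes a hypothetical tree representation `(label, [children])` and simply compares every root-to-leaf path of P with every root-to-leaf path of T, so it uses far more than linear space and time; the function names are illustrative only.

```python
from typing import List, Tuple

# Assumed representation (not from the source): a tree is (label, [children]).
Tree = Tuple[str, list]


def root_to_leaf_paths(tree: Tree) -> List[List[str]]:
    """Return the label sequence of every root-to-leaf path, left to right."""
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + rest
            for child in children
            for rest in root_to_leaf_paths(child)]


def is_subsequence(p: List[str], t: List[str]) -> bool:
    """True if p can be obtained from t by deleting elements of t."""
    it = iter(t)
    # `x in it` advances the iterator, so matches are found in order.
    return all(x in it for x in p)


def tree_path_subsequence(P: Tree, T: Tree) -> List[Tuple[int, int]]:
    """Report pairs (i, j): the i-th path of P is a subsequence of the j-th path of T."""
    p_paths = root_to_leaf_paths(P)
    t_paths = root_to_leaf_paths(T)
    return [(i, j)
            for i, p in enumerate(p_paths)
            for j, t in enumerate(t_paths)
            if is_subsequence(p, t)]


P = ("a", [("b", []), ("c", [])])
T = ("a", [("x", [("b", [])]), ("c", [("d", [])])])
print(tree_path_subsequence(P, T))
```

In this example P has the paths a-b and a-c, and T has the paths a-x-b and a-c-d; each path of P is a subsequence of exactly one path of T.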
Regular Expression Matching Using the Four Russian Technique. Given a regular expression $R$ and a string $Q$, the regular expression matching problem is to determine if $Q$ matches any of the strings specified by $R$. We give an algorithm for regular expression matching using $O(nm/\log n + n + m \log m)$ time and $O(n)$ space, where $m$ and $n$ are the lengths of $R$ and $Q$, respectively. This matches the running time of the fastest known algorithm for the problem while improving the space from $O(nm/\log n)$ to $O(n)$. Our algorithm is based on the Four Russian Technique. We extend our ideas to improve the results for the approximate regular expression matching problem, the string edit distance problem, and the subsequence indexing problem.
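As an illustration of the subsequence indexing problem mentioned above, here is a minimal baseline sketch. It preprocesses a string so that later queries can test whether a pattern is a subsequence of it. This is the textbook next-occurrence table, not the Four Russian based structure from the abstract; it takes space proportional to the text length times the alphabet size, and the function names are assumptions made for this example.

```python
from typing import Dict, List


def build_subsequence_index(text: str) -> List[Dict[str, int]]:
    """nxt[i][c] is the smallest position j >= i with text[j] == c (absent if none)."""
    nxt: List[Dict[str, int]] = [dict() for _ in range(len(text) + 1)]
    # Fill from right to left: each row copies the row after it and
    # then records the character at the current position.
    for i in range(len(text) - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][text[i]] = i
    return nxt


def query_subsequence(index: List[Dict[str, int]], pattern: str) -> bool:
    """Return True if pattern is a subsequence of the indexed text."""
    pos = 0
    for c in pattern:
        j = index[pos].get(c)
        if j is None:
            return False  # c does not occur at or after position pos
        pos = j + 1       # continue searching strictly after the match
    return True


index = build_subsequence_index("abcab")
print(query_subsequence(index, "acb"))
print(query_subsequence(index, "cc"))
```

Each query takes time proportional to the pattern length, independent of the text length, which is the trade-off a subsequence index aims for.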
Similar resources
Parameterized matching on non-linear structures
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...
Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs
We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.
Binary Jumbled Pattern Matching via All-Pairs Shortest Paths
In binary jumbled pattern matching we wish to preprocess a binary string S in order to answer queries (i, j) which ask for a substring of S that is of size i and has exactly j 1-bits. The problem naturally generalizes to node-labeled trees and graphs by replacing “substring” with “connected subgraph”. In this paper, we give an $n^2 / 2^{\Omega(\log n / \log\log n)^{1/2}}$ time solution for both strings and trees. This...
Algorithmics on SLP-compressed strings: A survey
Results on algorithmic problems on strings that are given in a compressed form via straight-line programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal...
A Novel Data Structure for String Matching Applicable in Network Processing
We address prefix matching problems which constitute the building block of some applications in the computer realm and related area. It is assumed there are strings of an alphabet Σ which are ordered. The data strings can have different lengths and some of them can be prefixes of others. A well known application of prefix matching is layer 3 IP switching in which routers forward an IP packet by...
Abelian pattern matching in strings
Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...
Journal: CoRR
Volume: abs/0708.4288
Publication date: 2007